Exercise 1

Load the data dataCar from the package “insuranceData”. It represents claim data on vehicle insurance policies from 2004 to 2005. Some variables like “gender” describe the policy holder, others like “veh_age” the vehicle, and some variables carry information on claims, e.g. “numclaims”. Each row represents policy information valid in a certain time window. Use the pipe, “dplyr”, and “ggplot2” to solve the following tasks.

library(tidyverse)
library(insuranceData)
library(plotly)

data(dataCar)

str(dataCar)
## 'data.frame':    67856 obs. of  11 variables:
##  $ veh_value: num  1.06 1.03 3.26 4.14 0.72 2.01 1.6 1.47 0.52 0.38 ...
##  $ exposure : num  0.304 0.649 0.569 0.318 0.649 ...
##  $ clm      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ numclaims: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ claimcst0: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ veh_body : Factor w/ 13 levels "BUS","CONVT",..: 4 4 13 11 4 5 8 4 4 4 ...
##  $ veh_age  : int  3 2 2 2 4 3 3 2 4 4 ...
##  $ gender   : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 2 1 1 ...
##  $ area     : Factor w/ 6 levels "A","B","C","D",..: 3 1 5 4 3 3 1 2 1 2 ...
##  $ agecat   : int  2 4 2 2 2 4 4 6 3 4 ...
##  $ X_OBSTAT_: Factor w/ 1 level "01101    0    0    0": 1 1 1 1 1 1 1 1 1 1 ...
head(dataCar)

a)

Draw barplots of the discrete variables “numclaims”, “agecat” (categorized driver age), and “gender”.

dataCar %>%
  ggplot(mapping = aes(x = numclaims)) +
  geom_bar(fill = "navyblue")

dataCar %>%
  ggplot(mapping = aes(x = agecat)) +
  geom_bar(fill = "navyblue")

dataCar %>%
  ggplot(mapping = aes(x = gender)) +
  geom_bar(fill = "navyblue")

b)

Draw a histogram of the vehicle value “veh_value” (in 10’000 Australian Dollars). Truncate values above 7 (this means: if a value is larger than 7, set it to 7).

dataCar %>%
  mutate(veh_value = pmin(veh_value, 7)) %>%  # truncate: values above 7 are set to 7
  ggplot(mapping = aes(x = veh_value)) +
  geom_histogram(fill = "navyblue", bins = 30)
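
Truncating at 7 amounts to taking the elementwise minimum of each value and 7. The following minimal base-R sketch checks, on made-up values (not taken from dataCar), that indicator arithmetic and pmin() give the same result:

```r
# Made-up vehicle values in units of 10'000 AUD (illustrative only)
veh_value <- c(0.52, 3.26, 7.00, 12.40, 34.56)

# Indicator arithmetic: keep values up to 7, replace larger ones by 7
truncated_arith <- (veh_value > 7) * 7 + veh_value * (veh_value <= 7)

# Elementwise minimum of each value and the cap
truncated_pmin <- pmin(veh_value, 7)

print(truncated_pmin)

# Both approaches agree on this input
stopifnot(isTRUE(all.equal(truncated_arith, truncated_pmin)))
```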

c)

Calculate the average number of claims per level of “agecat” and visualize the result as a scatterplot. Interpret the result.

dataCar %>%
  group_by(agecat) %>% 
  summarize(avg_claims = mean(numclaims)) %>% 
  ggplot(mapping = aes(x = agecat, y = avg_claims)) +
  geom_point(color = "navy")  # the default point shape is colored via `color`, not `fill`

The older the driver, the lower the average number of claims. This is plausible, since younger drivers may drive more recklessly than older ones.
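
Since each row covers only a certain time window, the plain average per row may not be comparable across groups with different policy durations. As a hedged sketch, assuming that “exposure” measures the fraction of a year a policy was in force, one could also relate claims to exposure. The toy data below are made up for illustration and are not taken from dataCar:

```r
library(dplyr)

# Made-up toy data with the same column names as dataCar (illustrative only)
toy <- data.frame(
  agecat    = c(1, 1, 2, 2),
  numclaims = c(1, 0, 0, 1),
  exposure  = c(0.5, 0.5, 1.0, 1.0)
)

freq <- toy %>%
  group_by(agecat) %>%
  summarize(
    avg_claims      = mean(numclaims),                  # average per row
    claims_per_year = sum(numclaims) / sum(exposure)    # claims per unit of exposure
  )

print(freq)
```

In this toy example both groups have the same average per row, but the group with shorter policies has more claims per unit of exposure.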

d)

Bin “veh_value” into quartiles and analyze its association with the number of claims as in 1c.

summary(dataCar$veh_value)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.010   1.500   1.777   2.150  34.560
plot <- dataCar %>% mutate(veh_value_bin = ntile(veh_value, n=4)) %>% 
  group_by(veh_value_bin) %>% 
  summarize(avg_claims = mean(numclaims)) %>% 
  ggplot(mapping = aes(x = veh_value_bin, y = avg_claims)) +
  geom_point(color = "navy")  # the default point shape is colored via `color`, not `fill`

plot

The higher the value of the car, the higher the average number of claims.
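
Quartile bins can also be built in base R from the empirical quartiles, which labels each bin with its value range. The sketch below uses made-up values (not taken from dataCar). Note that this is not identical to ntile(): ntile() splits by rank into groups of nearly equal size, whereas cut() keeps equal values in the same bin and fails if two quartiles coincide.

```r
# Made-up vehicle values (illustrative only)
x <- c(0.4, 1.0, 1.2, 1.5, 1.8, 2.2, 3.0, 9.5)

# Empirical quartiles serve as bin boundaries
breaks <- quantile(x, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum inside the first bin
veh_value_bin <- cut(x, breaks = breaks, include.lowest = TRUE)

print(table(veh_value_bin))
```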

e)

Use the “plotly” package to make the plot from d) interactive.

ggplotly(plot)

Exercise 3

The sieve of Eratosthenes is an ancient algorithm to get all prime numbers up to any given limit n, see Wikipedia. Write a function sieve_of_eratosthenes(n) that returns all prime numbers up to n. Benchmark the results for n = 10^5 with the package “microbenchmark”. Mind your coding style!

sieve_of_eratosthenes <- function(n) {
  sieve <- !logical(n)  # start with every number marked as a prime candidate
  i <- 2
  while (i <= sqrt(n)) {
    if (sieve[i]) {
      # cross out the multiples of i, starting at i^2
      j <- i^2
      while (j <= n) {
        sieve[j] <- FALSE
        j <- j + i
      }
    }
    i <- i + 1
  }
  out <- which(sieve)
  out[out != 1]  # 1 is not a prime
}
library(microbenchmark)

res <- microbenchmark(sieve_of_eratosthenes(10^5), times = 100)
print(res)
## Unit: milliseconds
##                         expr     min       lq     mean   median      uq     max
##  sieve_of_eratosthenes(10^5) 12.0717 17.20875 18.30951 18.05325 19.3729 28.5984
##  neval
##    100
ggplot2::autoplot(res)
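
The inner loop of the sieve can also be written as a single vectorized assignment over all multiples of i. The following is a minimal sketch of that variant, not the solution benchmarked above. The name sieve_vectorized is hypothetical, and no timings were measured for it here:

```r
# Hypothetical variant: crosses out all multiples of i in one assignment
sieve_vectorized <- function(n) {
  if (n < 2) {
    return(integer(0))  # there are no primes below 2
  }
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE  # 1 is not a prime
  i <- 2
  while (i * i <= n) {
    if (is_prime[i]) {
      # multiples below i^2 were already crossed out by smaller primes
      is_prime[seq(i * i, n, by = i)] <- FALSE
    }
    i <- i + 1
  }
  which(is_prime)
}

primes_30 <- sieve_vectorized(30)
print(primes_30)
```

To compare running times, such a function could be passed to microbenchmark() as an additional expression.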

Exercise 4

In Exercise 1c, we calculated and plotted the average number of claims per level of “agecat” in the dataCar data.

a. Write a function avg_claim_counts(v) that provides such a visualization for any discrete variable v.

b. Extend this function with a second argument interactive to control whether the resulting plot is interactive or not.

avg_claim_counts <- function(v, interactive = FALSE) {
  # v: name of a discrete column of dataCar, passed as a string
  plot <- dataCar %>%
    group_by(across(all_of(v))) %>%
    summarize(avg_claims = mean(numclaims)) %>%
    ggplot(mapping = aes(x = .data[[v]], y = avg_claims)) +
    geom_point(color = "navy")

  if (interactive) {
    plot <- ggplotly(plot)
  }

  plot
}
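
The function receives the column name as a string, which is why grouping goes through across(all_of(v)) rather than a bare column name. The following minimal sketch shows this grouping step in isolation. The data frame toy and the helper mean_claims_by are made up for illustration:

```r
library(dplyr)

# Made-up toy data (illustrative only, not taken from dataCar)
toy <- data.frame(
  gender    = c("F", "F", "M"),
  numclaims = c(0, 2, 3)
)

# Hypothetical helper: average number of claims per level of the column named v
mean_claims_by <- function(data, v) {
  data %>%
    group_by(across(all_of(v))) %>%   # v is a string such as "gender"
    summarize(avg_claims = mean(numclaims), .groups = "drop")
}

res_by_gender <- mean_claims_by(toy, "gender")
print(res_by_gender)
```

With the function defined above, a call would then look like avg_claim_counts("gender") or avg_claim_counts("agecat", interactive = TRUE).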

Exercise 5

Extend the “student” class from Section “plot, print, summary” by the optional information “semester”. It represents the number of semesters the student is already registered. Add a summary() method that would neatly print the name and the semester of the student.

student <- function(given_name, family_name, semester = NULL) {
  out <- list(
    given_name = given_name,
    family_name = family_name,
    semester = semester
  )
  class(out) <- "student"
  out
}

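summary.student <- function(object, ...) {
  # sep = "" avoids the extra spaces that cat() inserts between its arguments
  cat("Name: ", object$given_name, " ", object$family_name, "\n", sep = "")
  cat("Semester: ", object$semester, "\n", sep = "")
  invisible(object)
}

# Alternative: register the method with S4's setMethod().
# The S3 class should first be made known to S4 via setOldClass().

# setOldClass("student")
# setMethod("summary", "student", function(object, ...) {
#   cat("Name: ", object$given_name, " ", object$family_name, "\n", sep = "")
#   cat("Semester: ", object$semester, "\n", sep = "")
# })

me <- student("Tobias", "Hugentobler", 2)
summary(me)
## Name: Tobias Hugentobler
## Semester: 2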
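
Because “semester” is optional, the summary could skip that line when no semester was given. The following is a self-contained sketch of such a variant, not the required solution. The constructor is repeated in compact form so that the block runs on its own, and the example name is made up:

```r
# Compact copy of the constructor so the sketch runs on its own
student <- function(given_name, family_name, semester = NULL) {
  structure(
    list(
      given_name = given_name,
      family_name = family_name,
      semester = semester
    ),
    class = "student"
  )
}

# Variant of the summary method that omits a missing semester
summary.student <- function(object, ...) {
  cat("Name: ", object$given_name, " ", object$family_name, "\n", sep = "")
  if (!is.null(object$semester)) {
    cat("Semester: ", object$semester, "\n", sep = "")
  }
  invisible(object)
}

summary(student("Ada", "Lovelace"))     # prints the name only
summary(student("Ada", "Lovelace", 3))  # prints name and semester
```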